The SAIL databank: linking multiple health and social care datasets

Authors

  • Ronan A. Lyons
  • Kerina H. Jones
  • Gareth John
  • Caroline J. Brooks
  • Jean-Philippe Verplancke
  • David V. Ford
  • Ginevra Brown
  • Ken Leake

Abstract

BACKGROUND Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. To achieve this, however, the ability to link the data at the individual record level must be retained while adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets; over 500 million records from multiple health and social care service providers have been loaded to date, and further growth is in progress.
METHODS With the infrastructure of the databank in place, the aim of this work was to develop and implement an accurate matching process that assigns a unique Anonymous Linking Field (ALF) to person-based records, making the databank ready for record-linkage research studies. An SQL-based matching algorithm, MACRAL (Matching Algorithm for Consistent Results in Anonymised Linkage), was developed for this purpose. First, MACRAL was used to assess the suitability of a valid NHS number as the basis of a unique identifier. Second, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and to identify the optimum matching technique.
RESULTS Validation of the NHS number as an identifier yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, with error rates < 0.2%. Of the range of techniques applied to match datasets to the NHSAR, the optimum technique gave sensitivity values of 99.9% for a GP dataset from primary care, 99.3% for a PEDW dataset from secondary care and 95.2% for the PARIS database from social care.
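The abstract reports PRL results at a 50% threshold, together with sensitivity and specificity, but does not say how MACRAL (which is SQL-based) scores candidate record pairs. As an illustration only, the Python sketch below assumes a generic Fellegi-Sunter model: each compared field contributes a log2 agreement or disagreement weight, the summed weight is converted into a posterior match probability using an assumed prior match rate, and pairs at or above 0.5 are accepted. Sensitivity is then TP / (TP + FN) and specificity is TN / (TN + FP) against a gold standard. The field names, m/u probabilities, prior and the functions `match_weight`, `match_probability` and `evaluate` are all hypothetical and are not taken from the SAIL study.

```python
import math

# Hypothetical m/u probabilities for each comparison field (illustrative values,
# not taken from MACRAL or the SAIL study).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "forename": (0.90, 0.02),
    "date_of_birth": (0.98, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.001),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Total Fellegi-Sunter weight: sum of log2 likelihood ratios over compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def match_probability(weight, prior):
    """Posterior probability of a true match, given the weight and a prior match rate."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)


def evaluate(labelled_pairs, prior, threshold=0.5):
    """Classify pairs at the threshold and return (sensitivity, specificity).

    labelled_pairs: iterable of (rec_a, rec_b, is_true_match) from a gold standard.
    """
    tp = fp = tn = fn = 0
    for rec_a, rec_b, is_match in labelled_pairs:
        predicted = match_probability(match_weight(rec_a, rec_b), prior) >= threshold
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    a = {"surname": "JONES", "forename": "ANN", "date_of_birth": "1970-01-01",
         "sex": "F", "postcode": "SA2 8PP"}
    b = dict(a, surname="DAVIES")  # same person after a surname change
    c = {"surname": "SMITH", "forename": "BOB", "date_of_birth": "1980-05-05",
         "sex": "M", "postcode": "CF10 3AT"}
    prior = 1e-4  # hypothetical proportion of candidate pairs that are true matches
    print(evaluate([(a, b, True), (a, c, False)], prior))  # prints (1.0, 1.0)
```

In this toy run the surname disagreement lowers the weight of the true pair, but the agreeing date of birth, forename and postcode still carry it well above the 0.5 threshold. In a real linkage the m/u values would be estimated from the data (for example by expectation-maximisation) rather than fixed by hand.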
CONCLUSION With this infrastructure in place, the reliable matching process developed here enables an ALF to be allocated consistently to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.
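The abstract does not describe how an ALF value is generated, only that it must be allocated consistently once a record has been matched to the NHSAR. The Python sketch below illustrates that consistency property with a swapped-in technique, a keyed hash (HMAC-SHA256 from the standard library) over a register identifier: the same identifier and key always yield the same value, so records from different datasets that match the same register entry share one linking field. This is an assumption for illustration, not the SAIL procedure; `allocate_alf`, `assign_alfs`, the register identifiers, record keys and the key itself are hypothetical (only the dataset labels GP, PEDW and PARIS come from the abstract).

```python
import hashlib
import hmac


def allocate_alf(register_id: str, secret_key: bytes) -> str:
    """Hypothetical: derive a stable pseudonym from a register identifier.

    The same register_id and key always yield the same value, so every record
    matched to the same register entry receives the same linking field.
    """
    return hmac.new(secret_key, register_id.encode("utf-8"), hashlib.sha256).hexdigest()


def assign_alfs(matched_records, secret_key):
    """Attach a linking field to each (dataset, record_key, register_id) triple.

    register_id is None when the matching step found no register entry; such
    records are left without a linking field rather than given a new one.
    """
    return [
        (dataset, record_key,
         allocate_alf(register_id, secret_key) if register_id is not None else None)
        for dataset, record_key, register_id in matched_records
    ]


if __name__ == "__main__":
    key = b"example-key-held-by-a-trusted-party"  # hypothetical secret
    records = [
        ("GP", "g-001", "R123"),
        ("PEDW", "p-042", "R123"),
        ("PARIS", "s-007", None),
    ]
    for row in assign_alfs(records, key):
        print(row)
```

Using a keyed hash rather than a plain hash means that someone without the key cannot recompute the linking field by hashing candidate identifiers, while anyone holding the key obtains identical values on every run.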

Similar resources

Suicide Information Database-Cymru: a protocol for a population-based, routinely collected data linkage study to explore risks and patterns of healthcare contact prior to suicide to identify opportunities for intervention

INTRODUCTION Prevention of suicide is a global public health challenge extending beyond mental health services. Linking routinely collected health and social care system data records for the same individual across different services and over time has enormous potential in suicide research. Most previous research linking suicide mortality data with routinely collected electronic health records i...

The SAIL Databank: building a national architecture for e-health research and evaluation

BACKGROUND Vast quantities of electronic data are collected about patients and service users as they pass through health service and other public sector organisations, and these data present enormous potential for research and policy evaluation. The Health Information Research Unit (HIRU) aims to realise the potential of electronically-held, person-based, routinely-collected data to conduct and...

Development of an algorithm for determining smoking status and behaviour over the life course from UK electronic primary care records

BACKGROUND Patients' smoking status is routinely collected by General Practitioners (GP) in UK primary health care. There is an abundance of Read codes pertaining to smoking, including those relating to smoking cessation therapy, prescription, and administration codes, in addition to the more regularly employed smoking status codes. Large databases of primary care data are increasingly used for...

Case-finding for common mental disorders of anxiety and depression in primary care: an external validation of routinely collected data

BACKGROUND The robustness of epidemiological research using routinely collected primary care electronic data to support policy and practice for common mental disorders (CMD) anxiety and depression would be greatly enhanced by appropriate validation of diagnostic codes and algorithms for data extraction. We aimed to create a robust research platform for CMD using population-based, routinely coll...

P7: Dimensions of Adaptation, General Health, and Life Satisfaction in Multiple Sclerosis

Multiple Sclerosis (MS) is a debilitating disease which can affect general health and life satisfaction. This study aimed to determine dimensions of adaptation, general health, and life satisfaction in MS patients. This was a cross-sectional study in which samples were selected from MS patients in 2015. Data were collected using a demographic questionnaire, Roy Adaptation Model (RAM), General He...

Volume 9

Publication date: 2009